 Cayuga County


Sample Efficient Demonstration Selection for In-Context Learning

Purohit, Kiran, Venktesh, V, Bhattacharya, Sourangshu, Anand, Avishek

arXiv.org Artificial Intelligence

The in-context learning paradigm with LLMs has been instrumental in advancing a wide range of natural language processing tasks. The selection of few-shot examples (exemplars / demonstration samples) is essential for constructing effective prompts under context-length budget constraints. In this paper, we formulate the exemplar selection task as a top-m best arms identification problem. A key challenge in this setup is the exponentially large number of arms that need to be evaluated to identify the m-best arms. We propose CASE (Challenger Arm Sampling for Exemplar selection), a novel sample-efficient selective exploration strategy that maintains a shortlist of "challenger" arms, which are current candidates for the top-m arms. In each iteration, only one of the arms from this shortlist or the current top-m set is pulled, thereby reducing sample complexity and, consequently, the number of LLM evaluations. Furthermore, we model the scores of exemplar subsets (arms) using a parameterized linear scoring function, leading to a stochastic linear bandit setting. CASE achieves remarkable efficiency gains of up to 7x speedup in runtime while requiring 7x fewer LLM calls (87% reduction) without sacrificing performance compared to state-of-the-art exemplar selection methods. We release our code and data at https://github.com/kiranpurohit/CASE
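The challenger idea in the abstract above can be illustrated on a toy problem. The sketch below is not the CASE algorithm: it is a simplified, LUCB-style top-m identification loop over a handful of independent arms, with no linear scoring model and an initial pull of every arm (which would be infeasible with exponentially many exemplar subsets). The function name `top_m_with_challengers`, the Hoeffding-style confidence radius, the rule for choosing which arm to pull, and the Bernoulli reward simulator are all assumptions made for illustration.

```python
import math
import random


def top_m_with_challengers(pull, n_arms, m, shortlist_size=3, budget=5000, delta=0.05):
    """Toy top-m identification: pull one arm per round, chosen from a
    challenger shortlist or the current top-m set (illustrative only)."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    # Pull every arm once so each has an empirical mean (toy setting only).
    for arm in range(n_arms):
        sums[arm] += pull(arm)
        counts[arm] += 1
    total = n_arms

    def mean(arm):
        return sums[arm] / counts[arm]

    def radius(arm):
        # Hoeffding-style confidence radius for rewards in [0, 1] (assumed form).
        return math.sqrt(math.log(4 * n_arms * total * total / delta) / (2 * counts[arm]))

    while True:
        ranked = sorted(range(n_arms), key=mean, reverse=True)
        top, rest = ranked[:m], ranked[m:]
        if total >= budget or not rest:
            break
        # Shortlist: arms outside the top-m set with the highest upper bounds.
        challengers = sorted(rest, key=lambda a: mean(a) + radius(a), reverse=True)
        challengers = challengers[:shortlist_size]
        # Weakest member of the top-m set, judged by its lower bound.
        weakest = min(top, key=lambda a: mean(a) - radius(a))
        strongest = challengers[0]
        # Stop once no challenger can plausibly displace a top-m arm.
        if mean(weakest) - radius(weakest) >= mean(strongest) + radius(strongest):
            break
        # Pull exactly one arm: the least-sampled among shortlist and weakest top arm.
        arm = min(challengers + [weakest], key=lambda a: counts[a])
        sums[arm] += pull(arm)
        counts[arm] += 1
        total += 1
    return sorted(top), total


# Hypothetical simulator: each "arm" returns a Bernoulli reward.
true_means = [0.9, 0.8, 0.2, 0.1, 0.15, 0.3]
rng = random.Random(0)
chosen, used = top_m_with_challengers(
    lambda a: 1.0 if rng.random() < true_means[a] else 0.0,
    n_arms=len(true_means),
    m=2,
)
print(chosen, used)
```

In an exemplar-selection setting, `pull` would stand for one (costly) LLM evaluation of a candidate exemplar subset, which is why limiting each round to a single pull matters.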


WavePulse: Real-time Content Analytics of Radio Livestreams

Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay

arXiv.org Artificial Intelligence

Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.


Fine-resolution landscape-scale biomass mapping using a spatiotemporal patchwork of LiDAR coverages

Johnson, Lucas K., Mahoney, Michael J., Bevilacqua, Eddie, Stehman, Stephen V., Domke, Grant, Beier, Colin M.

arXiv.org Artificial Intelligence

Estimating forest aboveground biomass (AGB) at large scales and fine spatial resolutions has become increasingly important for greenhouse gas accounting, monitoring, and verification efforts to mitigate climate change. Airborne LiDAR is highly valuable for modeling attributes of forest structure including AGB, yet most LiDAR collections take place at local or regional scales covering irregular, non-contiguous footprints, resulting in a patchwork of different landscape segments at various points in time. Here, as part of a statewide forest carbon assessment for New York State (USA), we addressed common obstacles in leveraging a LiDAR patchwork for AGB mapping at landscape scales, including selection of training data, the investigation of regional or coverage-specific patterns in prediction error, and map agreement with field inventory across multiple scales. Three machine learning algorithms and an ensemble model were trained with Forest Inventory and Analysis (FIA) field measurements, airborne LiDAR, and topographic, climatic and cadastral geodata. Using a strict set of plot selection criteria, 801 FIA plots were selected with co-located point clouds drawn from a patchwork of 17 leaf-off LiDAR coverages (2014-2019). Our ensemble model was used to produce 30 m AGB prediction surfaces within a predictor-defined area of applicability (98% of LiDAR coverage), and the resulting AGB maps were compared with FIA plot-level and areal estimates at multiple scales of aggregation. Our model was overall accurate (%RMSE 22-45%; MAE 11.6-29.4 Mg ha$^{-1}$; ME 2.4-6.3 Mg ha$^{-1}$), explained 73-80% of field-observed variation, and yielded estimates that were consistent with FIA's design-based estimates (89% of estimates within FIA's 95% CI). We share practical solutions to challenges faced in using spatiotemporal patchworks of LiDAR to meet growing needs for AGB mapping in support of applications in forest carbon accounting and ecosystem.
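The accuracy figures quoted in the abstract above (RMSE as a percentage, MAE, ME) can be computed as in the following minimal sketch. The definitions used here are common conventions and are assumptions, not taken from the paper: ME is the mean of predicted minus observed, and %RMSE is RMSE divided by the mean observed value. The function name `agreement_metrics` and the plot values are made up for illustration.

```python
import math


def agreement_metrics(observed, predicted):
    """Return ME, MAE, RMSE and %RMSE for paired observed/predicted values.

    Assumed conventions: error = predicted - observed, and %RMSE is RMSE
    expressed as a percentage of the mean observed value.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    pct_rmse = 100.0 * rmse / (sum(observed) / n)
    return {"ME": me, "MAE": mae, "RMSE": rmse, "%RMSE": pct_rmse}


# Made-up plot-level AGB values in Mg/ha, for illustration only.
field_agb = [100.0, 150.0, 200.0]
mapped_agb = [110.0, 140.0, 210.0]
metrics = agreement_metrics(field_agb, mapped_agb)
print(metrics)
```

A positive ME under this sign convention indicates that the map overpredicts on average, while MAE and RMSE describe the typical size of the error regardless of direction.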


Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process

Dong, Wen, Liu, Bo, Yang, Fan

arXiv.org Artificial Intelligence

Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such real-world complex system control is difficult to achieve because of high-dimensional and non-linear system dynamics, and the exploding state and action spaces for the decision maker. Prior methods can be divided into two categories: simulation-based and analytical approaches. Existing simulation approaches have high variance in Monte Carlo integration, and the analytical approaches suffer from modeling inaccuracy. We adopt simulation modeling in specifying the complex dynamics of a complex system, and develop analytical solutions for searching optimal strategies in a complex network with high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision making problem as a discrete event decision process. To address the curse of dimensionality and search in high-dimensional state-action spaces in complex systems, we reduce control of a complex system to variational inference and parameter learning, introduce Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm leads to higher system expected rewards, faster convergence, and lower variance of value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.


Interaction Embeddings for Prediction and Explanation in Knowledge Graphs

Zhang, Wen, Paudel, Bibek, Zhang, Wei, Bernstein, Abraham, Chen, Huajun

arXiv.org Artificial Intelligence

Knowledge graph embedding aims to learn distributed representations for entities and relations, and is proven to be effective in many applications. Crossover interactions (bi-directional effects between entities and relations) help select related information when predicting a new triple, but have not been formally discussed before. In this paper, we propose CrossE, a novel knowledge graph embedding that explicitly simulates crossover interactions. It not only learns one general embedding for each entity and relation as most previous methods do, but also generates multiple triple-specific embeddings for both of them, named interaction embeddings. We evaluate embeddings on typical link prediction tasks and find that CrossE achieves state-of-the-art results on complex and more challenging datasets. Furthermore, we evaluate embeddings from a new perspective: giving explanations for predicted triples, which is important for real applications. In this work, an explanation for a triple is regarded as a reliable closed-path between the head and the tail entity. Compared to other baselines, we show experimentally that CrossE, benefiting from interaction embeddings, is more capable of generating reliable explanations to support its predictions.
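To make the notion of interaction embeddings concrete, here is a minimal sketch of one way a triple-specific score could be computed. The abstract does not state the scoring function, so everything below is an assumption for illustration: a per-relation interaction vector `c_r` modulates the head embedding, the modulated head in turn modulates the relation embedding, and the combination is matched against the tail embedding. The function names and the toy vectors are hypothetical, and plain Python lists stand in for learned embeddings.

```python
import math


def hadamard(u, v):
    """Element-wise product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def interaction_score(h, r, t, c_r, b):
    """Score a triple (h, r, t) with assumed interaction embeddings.

    h, r, t : general embeddings of head, relation and tail
    c_r     : relation-specific interaction vector (assumed)
    b       : bias vector (assumed)
    """
    h_int = hadamard(c_r, h)    # head as seen through the relation
    r_int = hadamard(h_int, r)  # relation as seen through that head
    combined = [math.tanh(x + y + z) for x, y, z in zip(h_int, r_int, b)]
    logit = sum(q * ti for q, ti in zip(combined, t))
    return 1.0 / (1.0 + math.exp(-logit))  # squash to (0, 1)


# Toy two-dimensional embeddings, for illustration only.
score = interaction_score(
    h=[1.0, 0.0], r=[1.0, 1.0], t=[1.0, 0.0], c_r=[1.0, 1.0], b=[0.0, 0.0]
)
neutral = interaction_score(
    h=[0.0, 0.0], r=[0.0, 0.0], t=[0.0, 0.0], c_r=[0.0, 0.0], b=[0.0, 0.0]
)
print(score, neutral)
```

The point of the sketch is only that the same entity yields different effective vectors under different relations, which is what distinguishes triple-specific embeddings from a single general embedding per entity.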